
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. #22126

Merged: 17 commits into ray-project:master on Feb 8, 2022

Conversation

sven1977 (Contributor) commented Feb 4, 2022

This PR:

  • Provides a new `training_iteration` function for A3C, as an alternative to the existing `execution_plan` (see the sketch after this list).
  • Uses that new iteration function by default (`_disable_execution_plan_api=True`).
  • ~3x speedup on tuned_examples/a3c/pong-a3c.yaml (16 workers, LSTM+CNN Atari problem).
  • As a consequence of this speedup, A3C learns the Pong problem again with an LSTM -> reinstates the previously commented-out weekly learning test case (similar to tuned_examples/a3c/pong-a3c.yaml).
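The diff itself is not shown on this page, so the following is only a minimal, self-contained sketch (plain Ray, not the actual RLlib code) of the asynchronous pattern a `training_iteration`-style A3C loop relies on: workers compute gradients in parallel, the driver applies whichever result arrives first, and the updated weights are synced back to that one worker only. All class and variable names below are illustrative, not RLlib APIs.

import ray

ray.init()


@ray.remote
class Worker:
    """Stand-in for a rollout worker that samples and computes gradients."""

    def __init__(self):
        self.weights = 0.0

    def set_weights(self, weights):
        self.weights = weights

    def compute_gradient(self):
        # Stand-in for sample() + compute_gradients() on a real rollout worker.
        return 0.1


workers = [Worker.remote() for _ in range(4)]
weights = 0.0

# Keep exactly one in-flight gradient request per worker.
in_flight = {w.compute_gradient.remote(): w for w in workers}

for _ in range(20):
    # Process results as they arrive (asynchronous, A3C-style) instead of
    # waiting for all workers, as a synchronous execution plan would.
    done, _ = ray.wait(list(in_flight), num_returns=1)
    ref = done[0]
    worker = in_flight.pop(ref)
    grad = ray.get(ref)

    # Apply the gradient on the driver (local worker).
    weights -= 0.01 * grad

    # Sync the updated weights back to the particular worker that sent the
    # gradient, then immediately launch its next request.
    worker.set_weights.remote(weights)
    in_flight[worker.compute_gradient.remote()] = worker

Given the description, the old `execution_plan` path presumably remains available by setting `_disable_execution_plan_api=False` in the trainer config.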

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


# Synch updated weights back to the particular worker.
with self._timers[SYNCH_WORKER_WEIGHTS_TIMER]:
    weights = local_worker.get_weights(local_worker.get_policies_to_train())
Reviewer comment (Member):

Nice

if global_vars:
    local_worker.set_global_vars(global_vars)

# TODO: If we have processed more than one gradients
Reviewer comment (Member):

So to be clear, we haven't written to `result` in this PR, right? But we want to, for logging purposes.

Author reply (sven1977, Contributor):

This is still WIP. I need to add proper compilation of the results dict. The only missing piece is to combine the learner stats from all workers that returned something from the `async_parallel_requests` call further above; the current shim implementation only returns the last one.
Let me finish this before merging, of course.
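One simple way to combine those per-worker learner stats, sketched here purely as an illustration (not necessarily what the final PR implements), is to average each numeric entry across all workers that answered the asynchronous request:

from collections import defaultdict


def combine_learner_stats(stats_dicts):
    """Average numeric learner stats across per-worker result dicts."""
    sums, counts = defaultdict(float), defaultdict(int)
    for stats in stats_dicts:
        for key, value in stats.items():
            if isinstance(value, (int, float)):
                sums[key] += value
                counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Example: learner stats from three workers that returned something.
print(combine_learner_stats([
    {"policy_loss": 0.5, "vf_loss": 1.2},
    {"policy_loss": 0.3, "vf_loss": 1.0},
    {"policy_loss": 0.4, "vf_loss": 1.1},
]))
# -> approximately {'policy_loss': 0.4, 'vf_loss': 1.1}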

avnishn (Member) commented Feb 4, 2022

Merge pending results dict.

sven1977 merged commit ac3e6ab into ray-project:master on Feb 8, 2022
wuisawesome added a commit that referenced this pull request Feb 9, 2022
…on instead of `execution_plan`) and re-instate Pong learning test. (#22126)"

This reverts commit ac3e6ab.
wuisawesome added a commit that referenced this pull request Feb 9, 2022
…on instead of `execution_plan`) and re-instate Pong learning test." (#22250)

Reverts #22126

Breaks rllib:tests/test_io
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
…ad of `execution_plan`) and re-instate Pong learning test. (ray-project#22126)
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
…on instead of `execution_plan`) and re-instate Pong learning test." (ray-project#22250)

Reverts ray-project#22126

Breaks rllib:tests/test_io